NSF PAR Search | NSF Public Access Repository

Semi-supervised Vertex Hunting, with Applications in Network and Text Analysis

Jiang, Yicong; Ke, Zheng Tracy (December 2025, Conference and Workshop on Neural Information Processing Systems)

Free, publicly-accessible full text available December 20, 2026
Network Goodness-of-Fit for the Block-Model Family

https://doi.org/10.1080/01621459.2025.2479242

Jin, Jiashun; Ke, Zheng Tracy; Tang, Jiajun; Wang, Jingming (July 2025, Journal of the American Statistical Association)

Free, publicly-accessible full text available July 3, 2026
Optimal Network Pairwise Comparison

https://doi.org/10.1080/01621459.2024.2393471

Jin, Jiashun; Ke, Zheng Tracy; Luo, Shengming; Ma, Yucong (April 2025, Journal of the American Statistical Association)

Free, publicly-accessible full text available April 3, 2026
Optimal Network Membership Estimation under Severe Degree Heterogeneity

https://doi.org/10.1080/01621459.2024.2388903

Ke, Zheng Tracy; Wang, Jingming (April 2025, Journal of the American Statistical Association)

Free, publicly-accessible full text available April 3, 2026
Yicong Jiang and Zheng Tracy Ke’s Contribution to the Discussion of “Root and community inference on the latent growth process of a network” by Harry Crane and Min Xu

https://doi.org/10.1093/jrsssb/qkae048

Jiang, Yicong; Ke, Zheng Tracy (June 2024, Journal of the Royal Statistical Society Series B: Statistical Methodology)

Full Text Available
Entry-Wise Eigenvector Analysis and Improved Rates for Topic Modeling on Short Documents

https://doi.org/10.3390/math12111682

Ke, Zheng Tracy; Wang, Jingming (June 2024, Mathematics)

Topic modeling is a widely utilized tool in text analysis. We investigate the optimal rate for estimating a topic model. Specifically, we consider a scenario with n documents, a vocabulary of size p, and document lengths at the order N. When N≥c·p, referred to as the long-document case, the optimal rate is established in the literature at p/(Nn). However, when N=o(p), referred to as the short-document case, the optimal rate remains unknown. In this paper, we first provide new entry-wise large-deviation bounds for the empirical singular vectors of a topic model. We then apply these bounds to improve the error rate of a spectral algorithm, Topic-SCORE. Finally, by comparing the improved error rate with the minimax lower bound, we conclude that the optimal rate is still p/(Nn) in the short-document case.
Full Text Available
Recent Advances in Text Analysis

https://doi.org/10.1146/annurev-statistics-040522-022138

Ke, Zheng Tracy; Ji, Pengsheng; Jin, Jiashun; Li, Wanshan (April 2024, Annual Review of Statistics and Its Application)

Text analysis is an interesting research area in data science and has various applications, such as in artificial intelligence, biomedical research, and engineering. We review popular methods for text analysis, ranging from topic modeling to the recent neural language models. In particular, we review Topic-SCORE, a statistical approach to topic modeling, and discuss how to use it to analyze the Multi-Attribute Data Set on Statisticians (MADStat), a data set on statistical publications that we collected and cleaned. The application of Topic-SCORE and other methods to MADStat leads to interesting findings. For example, we identified 11 representative topics in statistics. For each journal, the evolution of topic weights over time can be visualized, and these results are used to analyze the trends in statistical research. In particular, we propose a new statistical model for ranking the citation impacts of 11 topics, and we also build a cross-topic citation graph to illustrate how research results on different topics spread to one another. The results on MADStat provide a data-driven picture of the statistical research from 1975 to 2015, from a text analysis perspective.
Full Text Available
Improved algorithm and bounds for successive projection

Jin, Jiashun; Moryoussef, Gabriel; Ke, Zheng Tracy; Tang, Jiajun; Wang, Jingming (May 2024, International Conference on Learning Representations)

Full Text Available
Testing high-dimensional multinomials with applications to text analysis

https://doi.org/10.1093/jrsssb/qkae003

Cai, T Tony; Ke, Zheng T; Turner, Paxton (February 2024, Journal of the Royal Statistical Society Series B: Statistical Methodology)

Motivated by applications in text mining and discrete distribution inference, we test for equality of probability mass functions of K groups of high-dimensional multinomial distributions. Special cases of this problem include global testing for topic models, two-sample testing in authorship attribution, and closeness testing for discrete distributions. A test statistic, which is shown to have an asymptotic standard normal distribution under the null hypothesis, is proposed. This parameter-free limiting null distribution holds true without requiring identical multinomial parameters within each group or equal group sizes. The optimal detection boundary for this testing problem is established, and the proposed test is shown to achieve this optimal detection boundary across the entire parameter space of interest. The proposed method is demonstrated in simulation studies and applied to analyse two real-world datasets to examine, respectively, variation among customer reviews of Amazon movies and the diversity of statistical paper abstracts.
Full Text Available
Mixed membership estimation for social networks

https://doi.org/10.1016/j.jeconom.2022.12.003

Jin, Jiashun; Ke, Zheng Tracy; Luo, Shengming (February 2024, Journal of Econometrics)

Full Text Available
